1. Introduction


Electrical power generation is increasing because of population
growth and the resulting strong demand for electrical energy [1].
As generation by thermal power plants grows, thermal pollution
increases and more greenhouse gases are produced, which has raised
interest in power generation based on renewable energies [2].
Among the renewable energy sources, electrical power generation
based on wind energy has been the fastest growing [3]. It is
estimated that in 2020 about 12% of the world's electrical energy
will be supplied from wind energy [4]. Therefore, the electricity
generated by wind power will play an important role in electricity
supply.

Wind power depends on weather conditions such as wind speed, wind
direction, temperature and air pressure, as well as on
environmental obstacles. As a dynamic system, wind power is also
correlated with its own past values [3]. Because of its dependence
on atmospheric parameters, wind power is recognized as a
non-dispatchable source [5]. This makes wind power an uncertain
variable and reduces the system reliability. An accurate prediction
of wind power variations can moderate this problem to some
extent [6,7].

Wind power prediction based on meteorological variables faces some
difficulties. Sufficiently accurate measurements of meteorological
variables are commonly unavailable, and the measurement equipment
is too expensive to be maintained everywhere. Inaccurate
measurements or estimations, on the other hand, can result in large
errors in the wind power forecast. Moreover, the true model of the
wind power generating unit is commonly not at hand. Therefore,
achieving a low wind power forecasting error via a relatively
simple black-box model with a small number of measurable input
variables is highly desirable.

Based on the above discussion, this paper considers wind power
forecasting that uses the historical wind power data as the
forecasting model inputs. Optimally trained neural networks are
proposed as the modeling approach, and four seasonal wind power
data sets of the Alberta, Canada wind farm [8] are studied as the
real data for model construction and evaluation. To construct the
neural network model, a time series analysis based on recurrence
plots and correlation analysis is first applied to the available
wind power time series. In the next stage, a comparative study is
carried out among neural networks trained by the imperialist
competitive algorithm (ICA) [9], the genetic algorithm (GA) [10],
and the particle swarm optimization (PSO) [11,12] approach. The
simulation results indicate that ICA outperforms the other two
algorithms in tuning the neural network for wind power forecasting.

This paper is organized as follows. In Section 2, the related
research is reviewed. In Section 3, the data properties and the
input selection approach are described. In Section 4, the proposed
wind power prediction engine is presented. In Section 5, the design
and evaluation of the forecasting models for the wind power time
series of Alberta, Canada are described. Finally, Section 6
concludes the paper.


2. Related research


Wind power forecasting methods can be categorized into physical
models and time series (statistical) models [13,14]. In physical
modeling, the wind speed time series is estimated by taking into
account the physical characteristics of the environmental
conditions [15]. A statistical model attempts to find a
relationship among the historical data in order to predict the
future wind speed and wind power [16]. Commonly, physical models
are used for long-term prediction and statistical models are used
for short-term prediction [17].

In the literature, there are different attempts at short-term wind
power forecasting via hybrid time series methods. In [18], wind
power prediction is performed via a combination of a modified
hybrid neural network and an enhanced particle swarm optimization
algorithm. In [19], a wavelet transform support vector machine in
conjunction with statistical-characteristics analysis is employed
for short-term wind power prediction. In [20], a method is
presented to improve the short-term wind power prediction at a
given turbine using information from numerical weather prediction
and from multiple observation points. In that work, the prediction
of wind power is achieved in two stages: in the first stage, the
wind speed is predicted using the proposed method; in the second
stage, the wind speed is converted to output power using a power
curve model. In [21], a model based on the wavelet transform,
chaotic time series and the GM(1,1) method is presented for wind
farm power forecasting. A new approach based on clustering is
proposed in [22], and in [23] the ultra-short-term prediction of
wind power based on chaotic time series is considered. Artificial
neural networks (ANN) optimized by the Tabu search algorithm [24],
a hybrid PSO-ANFIS approach [25], wind farm power generation based
on fuzzy modeling [26], and a hybrid strategy of short-term wind
power prediction based on the physical strategy and the ANN
technique [27] have been addressed in the literature as well.
Besides, comprehensive reviews of the methods and models of wind
power may be found in [28-30].


3. The data properties and selection of appropriate input set 


As stated earlier, this paper considers the prediction of
experimental wind power data from the Alberta, Canada wind farm
[8]. The available data are four seasonal data sets for the year
2007, each containing 1368 hourly samples. The wind power is
predicted using feed-forward neural networks trained by three
optimization algorithms: ICA, GA and PSO. In feed-forward neural
networks, the outputs at any moment depend only on the neural
weights and the input signals to the network at that moment.
Therefore, proper selection of the inputs is essential to obtain
good performance of the trained neural network. To this end, two
stages are followed to determine the neural network inputs for each
seasonal data set. In the first stage, the characteristics and
predictability of the wind power time series are investigated via
recurrence plots. Based on the derived results, in the next stage,
correlation analysis is performed to choose proper input sets for
the four seasonal data sets.


3.1. The available data and its properties 


In search of proper inputs for our models, the experimental data
from the Alberta, Canada wind farm [8] for the year 2007 are
examined closely in this section. As mentioned earlier, the
available data are four seasonal data sets, each containing 1368
hourly samples. The data are shown in Fig. 1(a)-(d). As shown in
these figures, severe fluctuations are observed in the wind power
time series, while no sign of strong periodicity is visible. Such
fluctuations may be due to either the chaotic or the stochastic
nature of a nonlinear process [31-33]. Since we are interested in
predictability, it is important to distinguish between these two
types of processes. This property has been closely examined by the
authors in [34] via time series analysis methods; the results
indicate the stochastic nature, and hence the short-term
predictability, of the wind power time series. To briefly present
the results in [34] and to get a better view of the behavior and
characteristics of the underlying system, the recurrence plots
(RPs) of the wind power time series are investigated in this
section.

Fig. 1. The Alberta wind power time series for (a) the 1st season, (b) the 2nd season, (c) the 3rd season and (d) the 4th season.


3.1.1. The fundamentals of recurrence plots 

Recurrence is a fundamental property of dynamic systems, which can
be exploited to characterize the system's behavior. As a powerful
tool for the visualization and analysis of the phase space
trajectory of experimental time series, the recurrence plot (RP)
was introduced in the late 1980s by Eckmann et al. [35]. It is
especially useful for finding hidden correlations in highly
complicated data and for determining the stationarity of a time
series [36]. With an RP, one can graphically detect hidden patterns
and structural changes in the data or see similarities in patterns
across the time series under study [37]. This technique has been
successfully applied in various fields, such as physiology [38,39],
fluid dynamics [40], geology [41], economics [42], and energy
market indices [43-45]. In this paper, the RP methodology is
applied to analyze the behavior of the wind power time series; in
particular, the predictability of the time series is investigated
via these analyses.

To derive an RP, the phase space of the signal must first be
reconstructed, for example via the method of delays [46]. RPs
visualize the behavior of trajectories in phase space [36,47,48]
via a graphical representation of the matrix:

$$R_{i,j} = \Theta\left(\varepsilon - \lVert x_i - x_j \rVert\right), \qquad i, j = 1, \ldots, N \tag{1}$$

where $x_i$ stands for the point in the reconstructed phase space
at time $i$, $\varepsilon$ is a predefined threshold and
$\Theta(\cdot)$ is the Heaviside function. One assigns a "black"
dot to the value one and a "white" dot to the value zero. The
two-dimensional graphical representation of $R_{i,j}$ is then
called an RP [47] and can be used to distinguish between different
dynamic systems. In this sense, the RP examines the paths in the
state space. Three types of systems are recognized based on the
obtained plot: (1) periodic systems, (2) stochastic systems and
(3) chaotic systems [47,48]. Periodic systems are marked by
parallel, uninterrupted diagonal lines, where the distance between
the lines is proportional to the period. Such diagonal lines are
also seen in chaotic systems, but there the lines are interrupted
and shorter, and the distance between them is irregular. The length
of the lines is proportional to the degree of predictability of the
system. The RPs of uncorrelated stochastic systems consist of many
individual dots whose distribution is quite irregular [47,48].

Table 1
The embedding delay and embedding dimension for the Alberta wind power time series.

Season                 1     2     3     4
Embedding delay        5     7     7     6
Embedding dimension    20    14    16    16
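
As an illustration of Eq. (1), the following Python sketch reconstructs the phase space by the method of delays and thresholds the pairwise distances. It is not the CRP toolbox used in this paper; the function names, the Euclidean norm, the convention $\Theta(0) = 1$ and the test signal are assumptions made for the example.

```python
import numpy as np

def delay_embed(series, dim, delay):
    """Reconstruct phase-space vectors x_i by the method of delays."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    # Row i is [s(i), s(i + delay), ..., s(i + (dim - 1) * delay)].
    return np.column_stack(
        [series[k * delay : k * delay + n_vectors] for k in range(dim)]
    )

def recurrence_matrix(series, dim, delay, eps):
    """Binary matrix R[i, j] = Theta(eps - ||x_i - x_j||) of Eq. (1)."""
    x = delay_embed(series, dim, delay)
    # Pairwise Euclidean distances between all embedded points.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Assumed Heaviside convention: Theta(0) = 1.
    return (dist <= eps).astype(int)

# Illustrative periodic signal (period 20 samples), not the Alberta data.
signal = np.sin(2 * np.pi * np.arange(200) / 20)
R = recurrence_matrix(signal, dim=3, delay=5, eps=0.1)
```

For this periodic test signal the ones in `R` line up on diagonals spaced 20 samples apart, which is the periodic pattern described above. The matrix needs memory quadratic in the number of embedded points.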


3.1.2. The recurrence plot analysis results 

In order to reconstruct the phase space of the wind power time
series, the embedding delay and the embedding dimension of the time
series must first be determined. The mutual information method [46]
and the false nearest neighbors method have been used to calculate
the embedding delay and the embedding dimension, respectively, of
the four seasonal wind power time series. The resulting embedding
delays and embedding dimensions are given in Table 1. The RPs are
obtained using these delays and dimensions.

The RPs of the wind power time series, plotted via the CRP toolbox
of MATLAB [49], are shown in Fig. 2(a)-(d). From these figures it
is concluded that, for the first and second seasons, the erratic
short-term distribution of the recurrence points indicates a
strongly stochastic nature of the underlying time series with
limited predictability. The situation is somewhat different in
seasons 3 and 4, where the recurrence diagonals are longer and thus
the predictability is higher. White ribbons in the recurrence plots
correspond to transitions in the system dynamics. Such dynamic
transitions, together with the differing seasonal properties,
indicate the seasonality as well as the non-stationarity of the
wind power time series. Therefore, in selecting the inputs for the
forecasting model, the limited predictability of the wind power
time series should be taken into account. Besides, since the
forecasting model inputs are lagged wind power terms, they should
be as close as possible to the desired time to compensate for the
non-stationarity of the dynamics.

Fig. 2. The RP of the Alberta wind power time series for (a) the 1st season, (b) the 2nd season, (c) the 3rd season and (d) the 4th season.

Based on the above discussion, in the following section,
correlation analysis is carried out to select the appropriate
inputs for the forecasting model.


3.2. Correlation analysis 


Once the limited predictability of the wind power time series of
interest is established, the correlation properties of the
available data should be analyzed to choose the proper model
inputs. The plots in Fig. 3(a)-(d) show the autocorrelation
functions of the seasonal wind power data sets.

These figures illustrate that the wind power in each hour is highly
correlated with its lagged values of the preceding few hours of the
same day. For the previous days, the correlation decays rapidly,
which is another hallmark of limited predictability. We adopt a
correlation threshold of 0.7 to select the model inputs. This
threshold corresponds to 6, 6, 4, and 6 lagged values for the four
seasons, respectively.
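
The threshold rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the autocorrelation estimator, the stopping rule (keep consecutive lags until the autocorrelation first drops below the threshold), the `max_lag` value and the synthetic AR(1) series are all chosen for the example.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation r(1), ..., r(max_lag) of a 1-D series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def select_lags(series, threshold=0.7, max_lag=24):
    """Consecutive lags, starting at 1, whose autocorrelation is >= threshold."""
    lags = []
    for k, r in enumerate(autocorrelation(series, max_lag), start=1):
        if r < threshold:
            break
        lags.append(k)
    return lags

# Synthetic AR(1) series as a stand-in for an hourly wind power record.
rng = np.random.default_rng(0)
x = np.zeros(1368)
for t in range(1, len(x)):
    x[t] = 0.95 * x[t - 1] + rng.normal()
lags = select_lags(x)
```

Applied to a seasonal data set, `select_lags` would return the lag indices used as network inputs; a higher threshold yields fewer inputs.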


4. The power prediction engine 


Given the high performance of neural networks in modeling nonlinear
dynamics, they are employed in this paper as the modeling tool for
wind power prediction. In this section, we briefly review the
basics of neural networks and then turn to the developed models and
their performance.


4.1. The fundamentals of ANNs


Neural networks consist of highly interconnected simple processing
units designed to model how the human brain performs a particular
task [50,51]. Each of these units, called neurons, forms a weighted
sum of its inputs, to which a constant term called the bias is
added. This sum is then passed through a transfer function: linear,
sigmoid or hyperbolic tangent (Fig. 4(a)).

In a typical ANN, the neurons are organized in a way that defines
the network architecture. Networks with interconnections that do
not form any loops are called feed-forward. Recurrent, or
non-feed-forward, networks, in which there are one or more loops of
interconnections, are also used for some kinds of applications
[52]. Multilayer perceptrons (MLPs) are the best known and most
widely used kind of ANN. In these networks, neurons are arranged in
layers: an input layer, one or more hidden layers and an output
layer. The neurons in each layer may share the same inputs, but are
not connected to each other. Fig. 4(b) shows the architecture of a
generic three-layer feed-forward neural network model. In order to
find the optimal network architecture, one should evaluate several
combinations. These combinations include networks with different
numbers of hidden layers, different numbers of neurons in each
layer and different types of transfer functions. Typically, the
number of neurons in the hidden layer is chosen by trial and error.

Fig. 4. (a) Internal structure of a neuron and (b) the structure of an example three-layer feed-forward neural network.

In feed-forward neural networks, the output depends only on the
input signals and the neural weights at that moment. The activation
functions used in the hidden layers are commonly nonlinear transfer
functions, such as the log-sigmoid function, whose output lies in
the interval [0, 1], or the tan-sigmoid function, which maps its
input to the interval [-1, 1]. The output of a hidden layer is
computed as:

$$n_i = w_{i1} x_1 + w_{i2} x_2 + \cdots + w_{iR} x_R + b_i \tag{2}$$

$$a_i = f(n_i), \qquad i = 1, 2, 3, \ldots, S \tag{3}$$

where $x_R$ is the $R$th input; $S$ is the number of neurons;
$w_{iR}$ is the weight connecting the $R$th input to the $i$th
neuron of the hidden layer; $b_i$ is its bias; and $f(\cdot)$ is
the activation function. The output layer computes its output in
the same manner as the hidden layers, except that the linear
transfer function is commonly used as the activation function in
this layer.

Forecasting with neural networks involves two steps: training and
testing. Training of feed-forward networks is normally performed in
a supervised manner, in which both the inputs and the desired
outputs take part in training the network. An adequate selection of
inputs for neural network training is highly influential to the
success of training. A learning process in the neural network then
constructs an input-output mapping by adjusting the weights and
biases at each iteration, based on the minimization of some error
measure between the produced and the desired output. Thus, learning
entails an optimization process. The knowledge acquired by the
neural network through the learning process is tested by applying
new data that it has never seen before, called the testing set. The
network should be able to generalize and produce an accurate output
for this unseen data [43].
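
The layer computation of Eqs. (2) and (3) can be sketched in Python as follows. The layer sizes, the random weights and the function names are hypothetical; the hyperbolic tangent is used as one example of a nonlinear hidden-layer activation, with a linear output layer.

```python
import numpy as np

def layer_output(x, W, b, f):
    """Eqs. (2)-(3): n = W x + b, a = f(n) for one layer of S neurons."""
    return f(W @ x + b)

def forward(x, params):
    """Feed-forward pass: tanh hidden layers, linear output layer."""
    a = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        a = layer_output(a, W, b, np.tanh)
    W, b = params[-1]
    return layer_output(a, W, b, lambda n: n)

# Hypothetical sizes: R = 3 inputs, S = 4 hidden neurons, 1 output.
rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(4, 3)), rng.normal(size=4)),
    (rng.normal(size=(1, 4)), rng.normal(size=1)),
]
y = forward([0.2, -0.1, 0.5], params)
```

Row $i$ of each matrix `W` holds the weights $w_{i1}, \ldots, w_{iR}$ of neuron $i$, so the matrix product evaluates Eq. (2) for all $S$ neurons at once.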

The most common learning algorithm is the back propagation
algorithm [53], in which the error is propagated back toward the
input in order to adjust the weights and biases in each layer. The
standard back propagation learning algorithm is a steepest descent
algorithm that minimizes the sum of squared errors. It is not
numerically efficient and tends to converge slowly [50,53]. An
algorithm that trains a neural network 10-100 times faster than the
usual back propagation algorithm is the Levenberg-Marquardt
algorithm, which is a variation of Newton's method [50]. The
function $V(x)$ to be minimized with respect to the parameter
vector $x$ is the sum of squared errors:

$$V(x) = \sum_{i=1}^{N} e_i^2(x) \tag{4}$$

where $e(x)$ is the output error vector. The details of the
Levenberg-Marquardt algorithm can be found in [43]. This method,
however, commonly fails to converge to the global optimum.
Therefore, employing a more efficient optimization algorithm may
lead to a more accurate response, a lower forecasting error and
better convergence. Based on the above discussion, in the following
sections, three optimization algorithms, namely the imperialist
competitive algorithm (ICA), the genetic algorithm (GA), and the
particle swarm optimization (PSO) approach, are employed to train
the neural network for forecasting the wind power time series of
Alberta, Canada, and the results are compared for this case.


4.2. The trainer unit 


In this section the optimization algorithms employed for training 
the forecasting neural network models are briefly introduced. 


4.2.1. Imperialist competitive algorithm 

The imperialist competitive algorithm (ICA) is a recent
optimization technique inspired by imperialistic competition among
countries as a social and political process. ICA has shown
outstanding ability on various problems [54-57]. The algorithm
starts with $N_{country}$ initial countries, of which the
$N_{imp}$ best ones (the countries with the lowest cost) are
selected as imperialists. In [58], the ICA pseudo-code is described
as follows:


i. Selecting random points in the search space of the function and
initializing the empires.

ii. Moving the colonies toward their related imperialist
(absorption policy or assimilation) according to a predetermined
assimilation coefficient ($\beta > 1$) and assimilation angle
coefficient ($\gamma$), which determine the amount and the angle of
the movement.

iii. Randomly changing the location of some colonies (revolution).

iv. Keeping a colony in its empire, moving relative to the
imperialist, as long as its cost is not lower than that of the
imperialist; otherwise, exchanging the positions of the colony and
the imperialist.

v. Uniting the empires with the same conditions.

vi. Calculating the total cost of all empires via:

$$\text{Total cost of empire} = \text{Cost of imperialist} + \xi \times \operatorname{mean}(\text{cost of all colonies}) \tag{5}$$

where $\xi$ is a constant and $\operatorname{mean}(\cdot)$ stands
for the average of its arguments.

vii. Selecting the weakest colony (colonies) from the weakest
empire and giving it (them) to one of the other empires
(imperialistic competition).

viii. Destroying the weak empires.

ix. Stopping if the preset conditions are satisfied; otherwise,
returning to step ii.
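
Two of these steps, the assimilation move of step ii and the total empire cost of Eq. (5), can be sketched in Python as follows. This is a simplified illustration and not the pseudo-code of [58]: the angular deviation governed by $\gamma$ is omitted, the step length is assumed to be uniformly distributed on $[0, \beta d]$ with $d$ the colony-imperialist distance, and the value `xi=0.1` and the toy cost function are hypothetical.

```python
import numpy as np

def assimilate(colony, imperialist, beta, rng):
    """Move a colony toward its imperialist by a random fraction of beta * distance."""
    # Simplification: the angular deviation governed by gamma is omitted.
    step = beta * rng.uniform(0.0, 1.0) * (imperialist - colony)
    return colony + step

def total_empire_cost(imperialist_cost, colony_costs, xi):
    """Eq. (5): imperialist cost plus xi times the mean colony cost."""
    if len(colony_costs) == 0:
        return imperialist_cost
    return imperialist_cost + xi * np.mean(colony_costs)

def cost(p):
    """Toy cost function (sphere), standing in for the network training error."""
    return float(np.sum(p ** 2))

rng = np.random.default_rng(0)
imperialist = np.array([0.1, -0.2])
colonies = [np.array([2.0, 1.0]), np.array([-1.5, 0.5])]
colonies = [assimilate(c, imperialist, beta=2.0, rng=rng) for c in colonies]
tc = total_empire_cost(cost(imperialist), [cost(c) for c in colonies], xi=0.1)
```

With $\beta > 1$ a colony can overshoot its imperialist, so the region around the imperialist is explored from both sides.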


4.2.2. Genetic algorithm 

A genetic algorithm emulates biological evolution to solve
optimization problems. It is formed by a set of individual elements
(the population) and a set of biologically inspired operators that
can change these individuals. According to evolutionary theory,
only the individuals that are best suited in the population are
likely to survive and to generate offspring, thus transmitting
their biological heredity to new generations.

In computing terms, a genetic algorithm maps each potential
solution to a string of numbers. Each solution becomes an
individual in the population, and each string becomes a
representation of an individual. There should be a way to derive
each individual from its string representation. The genetic
algorithm then manipulates the most promising strings in its search
for an improved solution. The algorithm operates through a simple
cycle [10]:


i. Creation of a population of strings. 
ii. Evaluation of each string. 
iii. Selection of the best strings. 
iv. Genetic manipulation to create a new population of strings. 
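
The four-step cycle can be sketched in Python as a minimal real-coded GA. The operators chosen here (truncation selection of the best half, arithmetic crossover, Gaussian mutation, elitism) and all parameter values are assumptions made for the example; the paper does not state which operators were used.

```python
import numpy as np

def genetic_minimize(cost, dim, pop_size=40, generations=60,
                     crossover_frac=0.7, mutation_frac=0.2, seed=0):
    """Minimal real-coded GA following the four-step cycle above."""
    rng = np.random.default_rng(seed)
    # i. Create a population of strings (real-valued vectors here).
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    for _ in range(generations):
        # ii. Evaluate each string.
        costs = np.array([cost(ind) for ind in pop])
        # iii. Select the best strings (truncation selection, best half).
        parents = pop[np.argsort(costs)[: pop_size // 2]]
        # iv. Genetic manipulation: crossover and mutation.
        n_cross = int(round(crossover_frac * pop_size))
        n_mut = int(round(mutation_frac * pop_size))
        pa = parents[rng.integers(len(parents), size=n_cross)]
        pb = parents[rng.integers(len(parents), size=n_cross)]
        alpha = rng.uniform(size=(n_cross, 1))
        children = alpha * pa + (1.0 - alpha) * pb   # arithmetic crossover
        mutants = parents[rng.integers(len(parents), size=n_mut)]
        mutants = mutants + rng.normal(scale=0.1, size=mutants.shape)
        n_elite = pop_size - n_cross - n_mut         # best strings kept unchanged
        pop = np.vstack([parents[:n_elite], children, mutants])
    costs = np.array([cost(ind) for ind in pop])
    return pop[np.argmin(costs)], float(costs.min())

# Toy cost function (sphere), standing in for the network training error.
best, best_cost = genetic_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

Because the best strings are carried over unchanged, the best cost in the population cannot increase from one generation to the next.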


4.2.3. Particle swarm optimization algorithm 

Particle swarm optimization (PSO) is a method for performing
numerical optimization without explicit knowledge of the gradient
of the problem to be optimized. PSO is originally attributed to
Kennedy and Eberhart and was first intended for simulating social
behavior [12]. The algorithm was then simplified, and it was
observed to perform optimization. PSO is an efficient
population-based optimization technique, which is appropriate for
non-convex optimization problems [11,12]. In general, the velocity
and position updates of the $i$th particle at the $(k+1)$th
iteration are expressed as [11]:

$$v_i^{k+1} = w \, v_i^{k} + c_1 r_1 \left(p_i^{k} - x_i^{k}\right) + c_2 r_2 \left(p_g^{k} - x_i^{k}\right) \tag{6}$$

$$x_i^{k+1} = x_i^{k} + v_i^{k+1} \tag{7}$$

where, in Eq. (6), $v_i^k$ is the velocity of the $i$th particle at
the $k$th iteration, $p_i^k$ is the best position reached so far by
the $i$th particle, $p_g^k$ is the swarm's best known position, $w$
is the inertia weight, $c_1$ and $c_2$ are the learning factors,
$r_1$ and $r_2$ are random numbers, and $x_i^k$ is the position of
the $i$th particle at the $k$th iteration. In Eq. (6), the first
term provides the necessary momentum for particles to roam across
the problem space. The second term is the cognitive component,
which represents the individual experience of each particle and
encourages the particles to move toward their own best positions
reached. The last term is the social component, which represents
the collaboration of the particles in finding the global optimal
solution: the particles are pulled toward the global best position
reached. Finally, the position of the $i$th particle is updated by
Eq. (7) [11].
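
Eqs. (6) and (7) can be sketched in Python as follows. The random factors $r_1$ and $r_2$ are assumed to be drawn uniformly from [0, 1] independently for each particle and dimension, and the values of `w`, `c1`, `c2`, the swarm size and the toy cost function are illustrative; they are not the settings used in this paper.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w, c1, c2, rng):
    """One PSO iteration: velocity update of Eq. (6), position update of Eq. (7)."""
    r1 = rng.uniform(size=x.shape)   # random factors, assumed uniform on [0, 1]
    r2 = rng.uniform(size=x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    return x + v_new, v_new

def pso_minimize(cost, dim, swarm_size=30, iterations=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost with a basic PSO loop built on pso_step."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(swarm_size, dim))
    v = np.zeros_like(x)
    p_best = x.copy()
    p_cost = np.array([cost(p) for p in x])
    for _ in range(iterations):
        g_best = p_best[np.argmin(p_cost)]
        x, v = pso_step(x, v, p_best, g_best, w, c1, c2, rng)
        costs = np.array([cost(p) for p in x])
        improved = costs < p_cost        # update each particle's own best
        p_best[improved] = x[improved]
        p_cost[improved] = costs[improved]
    return p_best[np.argmin(p_cost)], float(p_cost.min())

# Toy cost function (sphere), standing in for the network training error.
best, best_cost = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

A particle that sits at both its own best and the swarm's best position with zero velocity does not move, since all three terms of Eq. (6) vanish.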


4.3. Evaluation indices 


As stated earlier, to evaluate the performance of an ANN, a testing
set containing new input data that the network has never seen
before is applied to the trained network. The performance of the
trained network is then evaluated by comparing the network output
with the actual value. Several statistical evaluation indices are
commonly used to judge an ANN's performance. Let $A_i$ and $P_i$ be
the actual value and the network output, respectively, related to
the $i$th input vector, where $N$ is the number of points in the
testing set. Then the evaluation indices are defined as [43]:


• Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert P_i - A_i \rvert \tag{8}$$

• Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(P_i - A_i\right)^2} \tag{9}$$

• Mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert P_i - A_i \rvert}{A_i} \times 100\% \tag{10}$$

• Modified mean absolute percentage error (Modified_MAPE): In the
relationship in Eq. (10), if the actual value is large and its
prediction becomes small, the computed relative error will become
near 100%. On the other hand, if the actual value is small, the
relative error may become very large even though the difference is
small. For this reason, the relationship in Eq. (10) is modified as
follows. At first, the average of the actual output values is
computed as:

$$\bar{A} = \frac{1}{N} \sum_{i=1}^{N} A_i$$

and then, the Modified_MAPE is computed as [59]:

$$\mathrm{Modified\_MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert P_i - A_i \rvert}{\bar{A}} \times 100\% \tag{11}$$

• Modified peak absolute percentage error (Modified_PAPE):

$$\mathrm{Modified\_PAPE} = \max_{1 \le i \le N} \left( \frac{\lvert P_i - A_i \rvert}{\bar{A}} \right) \times 100\% \tag{12}$$
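
Eqs. (8)-(12) translate directly into the following Python sketch. The function name and the three-point sample are hypothetical and serve only to show the computation.

```python
import numpy as np

def evaluation_indices(actual, predicted):
    """Evaluation indices of Eqs. (8)-(12) for actual values A and outputs P."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    abs_err = np.abs(p - a)
    a_mean = a.mean()                    # average of the actual values
    return {
        "MAE": abs_err.mean(),
        "RMSE": np.sqrt(np.mean((p - a) ** 2)),
        "MAPE": np.mean(abs_err / a) * 100.0,   # undefined if any actual value is 0
        "Modified_MAPE": np.mean(abs_err / a_mean) * 100.0,
        "Modified_PAPE": np.max(abs_err / a_mean) * 100.0,
    }

# Hypothetical sample: three actual values and their forecasts.
idx = evaluation_indices([100.0, 50.0, 150.0], [110.0, 45.0, 150.0])
```

In this sample the absolute errors are 10, 5 and 0 and the mean actual value is 100, so MAE is 5, Modified_MAPE is 5% and Modified_PAPE is 10%, while MAPE weights the two nonzero errors equally at 10% each.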


5. Design and evaluation of the forecasting models 
5.1. Input selection 


A multilayer perceptron feed-forward neural network with two hidden
layers is proposed in this paper for short-term forecasting of the
wind power time series. Based on the performed analyses, and
considering the short-term predictability of the wind power time
series as well as its non-stationarity and seasonality, four
separate neural network models have been synthesized, one for
forecasting the Alberta wind power time series in each season. To
select the appropriate inputs, a threshold of 0.7 has been applied
to determine the correlated lagged data used as the network inputs.
The lagged data and the underlying correlations are presented in
Tables 2-5, corresponding to the autocorrelation graphs shown in
Fig. 3. In these tables, $WP_i$, $i = 1, \ldots, 4$, stands for the
wind power time series of season $i$. Based on these results, the
wind power of 1 to 6 hours before the desired hour is used as the
neural network inputs for the first, second and fourth seasons,
while for the third season only the wind power of 1 to 4 hours
before is used.

Table 2
Selected inputs for WP_1(t), the Alberta wind power time series, 1st season.

Rank    Selected input    Autocorrelation
1       WP_1(t-1)         0.976
2       WP_1(t-2)         0.933
3       WP_1(t-3)         0.881
4       WP_1(t-4)         0.826
5       WP_1(t-5)         0.766
6       WP_1(t-6)         0.706

Table 3
Selected inputs for WP_2(t), the Alberta wind power time series, 2nd season.

Rank    Selected input    Autocorrelation
1       WP_2(t-1)         0.973
2       WP_2(t-2)         0.928
3       WP_2(t-3)         0.881
4       WP_2(t-4)         0.831
5       WP_2(t-5)         0.781
6       WP_2(t-6)         0.733

Table 4
Selected inputs for WP_3(t), the Alberta wind power time series, 3rd season.

Rank    Selected input    Autocorrelation
1       WP_3(t-1)         0.956
2       WP_3(t-2)         0.88
3       WP_3(t-3)         0.797
4       WP_3(t-4)         0.712

Table 5
Selected inputs for WP_4(t), the Alberta wind power time series, 4th season.

Rank    Selected input    Autocorrelation
1       WP_4(t-1)         0.973
2       WP_4(t-2)         0.929
3       WP_4(t-3)         0.881
4       WP_4(t-4)         0.831
5       WP_4(t-5)         0.779
6       WP_4(t-6)         0.726

In order to find the optimal network input set, several correlation
thresholds were evaluated. Among them, the selected threshold, and
hence the selected input set, on the one hand reflects the
correlation properties of the available data and, on the other
hand, yields a proper convergence rate.


5.2. Network configuration 


As stated earlier, in training an ANN, the number of hidden layers
and the number of neurons in each layer considerably affect the
prediction precision and the training rate. Therefore, in order to
find the optimal network architecture, several configurations were
evaluated. These included networks with different numbers of hidden
layers, different numbers of neurons in each layer and different
types of transfer functions. We converged to a configuration
consisting of two hidden layers with the following numbers of
neurons: 6 in the input layer for the first, second and fourth
seasons and 4 for the third season, 7 and 5 in the hidden layers,
and 1 in the output layer. All of the input data were normalized
between -1 and 1. Based on this normalization, the transfer
function for the input and hidden layer neurons has been selected
as a tan-sigmoid transfer function, defined by:

$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \tag{13}$$

The linear transfer function is used in the neurons of the output
layer. For training the network, the neural network toolbox of
MATLAB [60] was selected due to its flexibility and simplicity [9].
The cost function of Eq. (4) is considered as the training index,
and ICA, GA, and PSO have been employed to find the optimal network
weights that minimize it. The corresponding properties and
parameters of the optimization approaches are given in Table 6. For
training each seasonal network, 1200 samples of each wind power
data set have been used for training and the remaining 168 samples
for evaluation.
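
The data preparation described above can be sketched in Python as follows. The min-max scaling formula, the construction of the lagged input matrix and the choice to hold out the last 168 hours are assumptions made for the example, and the placeholder series is random rather than the Alberta data. The parameter count assumes fully connected layers in which the input layer only passes the inputs on.

```python
import numpy as np

def scale_to_unit_range(x):
    """Linearly map a series onto [-1, 1] using its own minimum and maximum."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

def lagged_dataset(series, n_lags):
    """Rows are [WP(t-1), ..., WP(t-n_lags)]; targets are WP(t)."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[n_lags - k : len(s) - k] for k in range(1, n_lags + 1)])
    return X, s[n_lags:]

def n_parameters(layer_sizes):
    """Number of weights and biases of a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Placeholder series of 1368 hourly values (not the Alberta data).
wp = scale_to_unit_range(np.random.default_rng(0).uniform(0.0, 300.0, size=1368))
X, y = lagged_dataset(wp, n_lags=6)
X_train, y_train = X[:-168], y[:-168]   # all but the last week
X_test, y_test = X[-168:], y[-168:]     # last 168 hours held out
n_weights = n_parameters([6, 7, 5, 1])
```

Under these assumptions, a 6-7-5-1 network has 95 adjustable weights and biases, which is the dimension of the search space explored by ICA, GA or PSO; the 4-input network of the third season has 81. Because the first six samples have no complete lag history, this sketch yields 1194 rather than 1200 training rows; how the first samples were handled in the paper is not stated.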


5.3. Evaluation results 


In this section, the performance of the proposed prediction engine
is investigated. That is, the performance of the neural networks
trained by ICA, PSO and GA is compared for wind power prediction.
Figs. 5-8 show the results of the trained neural networks for the
three cases. For comparison, the results of the method in [8] are
given as well. Besides, Table 7 gives the validation indices, i.e.
MAE, RMSE, Modified_MAPE and Modified_PAPE, for both test and train
data. As seen from these results, the proposed prediction engine
outperforms the method in [8]. Among the three proposed hybrid
cases, the hybrid of ICA and NN shows the best performance with the
lowest error indices. From the convergence point of view, the
methods are compared in Fig. 9: ICA in conjunction with the neural
network shows the fastest convergence, while the neural network
model trained by PSO converges faster than the model trained by GA.

Fig. 5. The actual wind power and the wind power forecasted by the hybrid NN for the 1st test week of Alberta (horizontal axis: time, hours; vertical axis: wind power, MW).

Fig. 6. The actual wind power and the wind power forecasted by the hybrid NN for the 2nd test week of Alberta.

Fig. 7. The actual wind power and the wind power forecasted by the hybrid NN for the 3rd test week of Alberta.

Fig. 8. The actual wind power and the wind power forecasted by the hybrid NN for the 4th test week of Alberta.

Fig. 9. The convergence curve of the proposed hybrid neural networks for various optimization methods.

Table 6
The properties and parameters of the employed optimization algorithms.

Algorithm    Parameter                                  Value
ICA          Number of initial countries                40
ICA          Number of initial imperialists             8
ICA          Revolution rate                            0.3
ICA          Assimilation coefficient (β)               2
ICA          Assimilation angle coefficient (γ)         0.5
PSO          Population size (swarm size)               200
PSO          Personal learning coefficient (c1)         2
PSO          Global learning coefficient (c2)           2
PSO          Inertia weight damping ratio (w)           0.99
GA           Population size                            100
GA           Crossover percentage                       0.7
GA           Mutation percentage                        0.2

Table 7
The comparison of the evaluation indices for different hybrid methods.

Method               Test week    Correlation of test data    MAE of test data
ICA-NN               1            0.99493                     3.4320
ICA-NN               2            0.98765                     3.5459
ICA-NN               3            0.97657                     4.7084
ICA-NN               4            0.98407                     6.9516
PSO-NN               1            0.98136                     5.8490
PSO-NN               2            0.95684                     5.9023
PSO-NN               3            0.94692                     6.9452
PSO-NN               4            0.98296                     7.2965
GA-NN                1            0.98146                     7.0851
GA-NN                2            0.95513                     7.0337
GA-NN                3            0.93777                     7.4237
GA-NN                4            0.98040                     8.2172
The method in [8]    1            0.9791                      6.9984
The method in [8]    2            0.9319                      7.2740
The method in [8]    3            0.90547                     8.7586
The method in [8]    4            0.9624                      7.7107


6. Conclusions 


In this paper, accurate forecasting of wind power, a key
requirement for proper performance of a wind farm, has been
considered. The wind power data to be forecasted are the four
seasonal wind power data sets of the Alberta, Canada wind farm,
which are studied as the real data for model construction and
evaluation. In order to synthesize an accurate model for wind power
prediction, the behavior of the wind power time series has first
been characterized via a powerful time series analysis method known
as the recurrence plot. This characterization shows that the wind
power time series behaves as a stochastic signal with limited
predictability. Non-stationarity and seasonality are further
characteristics of the wind power time series. Based on the
analysis results, short-term forecasting of the wind power time
series has been considered via hybrid optimized neural network
models. Due to the limited predictability of the time series, its
recent past values, which are highly correlated with the hourly
wind power, have been considered as the model inputs. This
correlation analysis has led to the selection of the wind power of
at most 1 to 6 hours before the desired hour as the neural network
inputs. Next, the neural network model has been trained via three
powerful optimization algorithms: GA, PSO and ICA. The prediction
results as well as the evaluation indices indicate that the hybrid
model of neural network and ICA outperforms the others. Low error
indices and very fast convergence are the main properties of the
hybrid ICA-neural network model.